home *** CD-ROM | disk | FTP | other *** search
-
- Detecting oh, roughly every polymorphic engine out there.
- By Rhincewind [Vlad]
-
-
- This article assumes that the reader has basic knowledge of polymorphic
- engines, mathematical reasoning and logics. I suggest to those who lack
- the first to read Dark Angel's fine two-part 'polymorphism primer',
- published in issues 11 and 12 of the Phalcon/Skism publication 40hex.
-
- ---- Definitions and terminology, and some general bits.
-
- The ideal polymorphic engine is popularly defined as a program that can
- transform a given block of code into another block of code that will have the
- exact same functionality and that any number of transformed blocks, all
- different instances of the same original code, share no predictable patterns
- as the presence of such patterns can be used to deduce the presence of
- encrypted code the engine is meant to hide.
-
- The most commonly used method by far to approximate this ideal is to apply an
- encryption loop on the given code, and prepend a program to the
- encrypted block of code that will decrypt it, and then pass on execution
- as if the program hadn't been encrypted at all.
-
- To get anywhere close to the ideal defined above, the prepended program
- most of course also vary, and this is where the real benefit of this
- approach comes in. Instead of having to vary a large block of code,
- only a decryption loop needs to be varied, typically having a size of
- just a little over 10 bytes. The advantages should be obvious.
-
- The disadvantages are a little less obvious and they are the main topic
- of this article. The decryption algorithm is kept as simple
- as possible as the complexity of the decryptor has a direct relation to the
- complexity of the code needed to succesfully vary it. Result? Encryption
- schemes even Caesar would've been ashamed of using: usually one round of
- operation, transforming each data unit U to it's crypttext equivalent U'.
- Yes transforming, it really isn't that much like encryption at all but alas,
- terminology is easier when this is regarded as encryption.
-
- As said, unit per unit, typically byte- or word-length, code is transformed.
- The detransformation procedure merely performs the reciprocal operation,
- et voila. Typical transformational operations are ADD, SUB, XOR and bit
- rotations. All these are binary operators, and, viewed as
- encryption can be said to use a key K. U' = U + K, or U' = U - K or
- U' = U XOR K.
-
- ---- How to detect? And a small bit o' maths.
-
- An engine would be undetectable if 1) the decryptor is indistinguishable from
- programs without a decrypting function and 2) the encrypted code is
- indistinguishable from random data. Of course it would help if the decryptor
- were like random data but alas, the restrictions imposed on programs by
- the processor's instruction set do not permit this.
-
- Decryptor attacks come in a few flavors but are basically all the same.
- They see if a fragment of code could have been generated by the engine
- the attack is looking for. This is obviously a flawed attack as parts
- of a legitimate program may very well have been generated by an engine.
- Hiding polymorphically generated code in the middle of a legitimate
- program makes flawless detection of this type virtually impossible,
- see Commander Bomber.
-
- The most foolish of decryptor attacks traces the decrypting code, looking
- for processor opcodes the engine searched for can't generate. Prepending
- one or two instructions foreign to the engine will throw off this attack.
- Quite a few of the lesser anti-virus programs can be fooled by this.
-
- A more sophisticated, but related attack involves statistical analysis.
- Perhaps it's worth a description in another issue. Votes please!
-
- No, the real attacks of the moment are different. They focus on the encrypted
- code. There may not be a predictable unit sequence, but the units have
- distinct mathematical relations to eachother as all were transformed using
- the same operation. Exploiting this is not hard at all if the reciprocal
- operator is known.
-
- It is possible to transform corresponding parts of crypt- and plaintext
- independently into a common form. To use this for detection you assume a
- string of units to be an encrypted version of a string of plaintext,
- compute what should be their common form, once from the plaintext, once
- from the crypttext, and see if it's a match. The manipulations needed to
- get to this common form are different for each encryption type, but are for
- single loops easily obtained as follows:
-
- A' = A OP K and
- B' = B OP K where A and B are plaintext (unencrypted) data units,
- and A' and B' form their crypttext.
-
- Now we can derive:
-
- A' NegOP B' ==
- (A OP K) NegOP (B OP K) ==
- A OP K NegOP B NegOP K ==
- A NegOP B.
-
- Yes, A' NegOP B' == A NegOP B!
-
- To illustrate this: A' XOR B' = A XOR B if the code was encrypted with
- a XOR loop. You XOR consecutive unitpairs of crypttext, and of your
- plaintext signature and check the results. Pairs meaning
- D[n] and D[n+1], followed by D[n+1] and D[n+2], with D as a unit-length array
- of data.
-
- Of course this also works for a bit more complex loops, such as an addition
- loop where K is increased by one every iteration, and one unit is transformed
- per iteration.
-
- A' = A + K
- B' = B + K + 1
-
- A' - B' ==
- (A + K) - (B + K + 1) ==
- A + K - B - K - 1 ==
- A - B - 1.
-
- A' - B' == A - B - 1
-
- Now you compute A' - B' and A - B - 1 and compare the results. As you can
- see, the operations needed to transform crypt- and plaintext to a common
- form need not be equal, as long as both results are.
-
- This attack is called cryptanalysis; you check if a string of units
- could be an encrypted version of a plaintext string. False positive error
- chance drops rapidly as signaturelength increases following
- 1/2^(unit_bitlen*unit_siglen).
-
- Now try another one. A XOR loop where the loop counter
- (=units left to transform) is added to the key K each iteration.
-
- A' = A XOR K
- B' = A XOR (K + C)
-
- A' XOR B' ==
- (A XOR K) XOR (A XOR (K+C)) ==
- ??? We can't eliminate K!
-
- If you can't eliminate K, then a common form cannot be computed from A and B.
- The solution is to recover K using a cryptological known plaintext attack.
- This kind of attack happens to be ridiculously easy on 'ciphers' (sic) like
- these.
-
- K = A' XOR A
- K + C = B' XOR B
-
- Now we have K. This is need not be the K of the first byte encrypted, just
- start somewhere in the middle of call it THE K, like the pioneers who
- coincidentally found some land and decided to call it theirs. If it were
- an easier loop we could take K and decrypt all of the code, but alas,
- now we must first recover the counter C.
-
- K + C - K = C == B' XOR B - A' XOR A = C
-
- Now we can decrypt the Nth unit, A being unit 0,
- by U=U' XOR (K+N*C) and comparing U against the unit in our plaintext
- signature.
-
- This attack is usually also referred to as being cryptanalytic, but
- technically it falls in the domain of cryptology as you actually decrypt
- what is encrypted. For this type it is vital that you operate on
- pairs of consecutive units only. With most cryptanalytic attacks,
- you may just as well muddle about.
-
- ---- The limit of these attacks.
-
- Nearly all polymorphic engines to date can be caught using these simple, what
- I like to call 'crypto-sweeps'. Things get a bit more complex when multiple
- layer encryption is used. Let's take an engine that encrypts each unit
- using an ADD and a XOR:
-
- A' = (A+K1) + K2
- B' = (B+K1) + K2
-
- Looks easy:
-
- A' XOR B' == (A+K1) XOR (B+K1)
-
- But now you need to recover or eliminate K1 which is not at all easy with the
- XOR in the middle. With even more layers, it becomes practically impossible
- to decrypt and the weight is pushed back to the complexity of the decryptor.
-
- All in all I'd say that what we've seen so far is just the start of
- polymorphism. Territory to be conquered, things unthought of to be thought
- of, and so on.
-
- Rhince.
-